In recent times, fitness trackers and smartphones equipped with sensors such as
gyroscopes, accelerometers and global positioning system sensors, together with
their companion programs, have been used for recognizing human activities. In this paper, the
results collected from these devices are used to design a system that can
assist an application in monitoring a person's health. The proposed system
takes the raw sensor signals as input, preprocesses them and, using machine
learning techniques, outputs the state of the user with minimum error. The
objective of this paper is to compare the performance of four algorithms:
logistic regression (LR), support vector machine (SVM), k-nearest neighbor
(k-NN) and random forest (RF). The algorithms are trained and tested with
the original number of features as well as with a transformed, smaller number of
features (using linear discriminant analysis). The data with the smaller number
of features is then used to visualize the high-dimensional data. In this paper,
each data point in the high-dimensional data is mapped to two-dimensional
data using the t-distributed stochastic neighbor embedding technique. Overall,
the high-dimensional data is first visualized and then compared with the model’s
performance under different algorithms and different numbers of coordinates.
1. INTRODUCTION


Human activity recognition is a process in which sequences of data collected using smartphones or
smart-belts are classified into movements using different techniques. Several fitness trackers
incorporating powerful sensors such as accelerometers, gyroscopes and visual sensors are released in the market
every year. The acquired signals are stored in real time. Traditional devices used pedometers to take step
counts. Pedometers are cheaper; however, devices that primarily use accelerometers are favored because they give
more accurate results. Most devices use the accelerometer to keep track of activity and measure it along three
axes, which can be used to derive measures such as energy estimates and energy intensity [1], [2].
In addition, other sensors such as gyroscopes and magnetometers are used to potentially improve the accuracy.
Orientation and angular velocity estimated by the gyroscope help in better prediction of human activity. Often,
accelerometers, gyroscopes and magnetometers are combined to form inertial measurement units (IMU) to
give more accurate metrics [3].

Human activity data plays an important role in analyzing the overall health of a patient and has
fundamental utility in clinical research. Research shows that physical activity plays a substantial role in
reducing the risk of non-communicable diseases (NCD). For a long time, physical activity has been studied in
epidemiological research to map the relationship between health and human activity. Patients are now
required to follow a definite exercise routine, and their activities are tracked and analyzed together with their health
status. Such continuous monitoring has been shown to improve the reliability of diagnosis and enhance patients’
quality of life. For a good analysis of human activity, it is extremely important to predict the type of activity
for a given signal with good accuracy.

Many approaches have been proposed for the task of classifying human activity. In this paper, the
human activity recognition task is treated as a supervised learning problem. The proposed method first uses all features
for prediction and then applies feature reduction, paring the data down to only those features that enhance the
prediction accuracy. Moreover, state-of-the-art methods have not given importance to data visualization. The proposed
work strives to provide a visualization of high-dimensional data to track and analyze how different activities
are related to each other. This helps in understanding why and how it is difficult to separate
some activities from others. The paper is organized as follows: section 2 reviews related work. Section 3 discusses the
research method used. Section 4 explains linear discriminant analysis (LDA), with a detailed discussion of the
algorithms in section 5. Section 6 explains the results obtained, with concluding remarks in section 7.


2. RELATED WORK 

This section discusses the work done so far and enumerates some of the classic state-of-the-art
models. In a study [4], traditional algorithms such as naive Bayes, hidden Markov model (HMM), hidden
semi-Markov model (HSMM) and conditional random fields (CRFs) were compared with deep learning models
on raw sensory data, and it was validated that the deep learning models outperformed the best result by 40%.
Similar deep learning approaches have previously been employed for recognition in [5], [6]. The concept of
combining AdaBoost with other classifiers (C4.5, multilayer perceptron and logistic regression (LR)) was
introduced in [7]. It was found that AdaBoost combined with C4.5 gave an accuracy of 94.04%. A similar
technique with a slight modification, combining the AdaBoost algorithm with decision stump (DS), Hoeffding
tree (HT), random tree (RT), J48, random forest (RF) and reduced error pruning (REP) tree, was used to
classify six activities of daily life using the Weka tool [8]. Bayat et al. [9] used a single triaxial
accelerometer to obtain accurate recognition. Different activities of a person were analyzed using a
classification model together with feature selection. The model was trained with different classifiers, and
an overall efficiency of 91.15% was reported when the average of probabilities was used as the
fusion method. Machine learning models including naive Bayes, support vector machine (SVM) and Markov
chains have been employed for recognition of human activity [10], [11]. Liu et al. [12] proposed two methods: in the first,
activity recognition is performed using a machine learning model after feature extraction on the
data collected using an accelerometer and gyroscope; in the second, a convolutional neural network model is applied to the raw data.
The final result indicates that SVM performs best among the methods and that accelerometer readings
contribute more to recognition than gyro sensor readings. In study [13], a position-independent method is
proposed in which raw data are first converted into vertical and horizontal acceleration so as to avoid the
influence of the orientation of smartphones on prediction. Bao and Intille [14] describe the use of five
biaxial accelerometers worn on the wrist, upper arm, ankle, right hip and thigh; 20 types of
activities were monitored, features such as mean, entropy and energy were recorded, and the model was trained using
naive Bayes classifier, decision trees, C4.5 and instance-based learning.

The work in [15] recognizes activities using a single triaxial accelerometer worn near the pelvic
region. Eight activities (standing, walking, running, climbing up stairs, climbing down stairs, sit-ups,
vacuuming and brushing teeth) were performed by two subjects in multiple rounds over different days. The model
was trained using base-level classifiers: decision tables, decision trees, k-nearest neighbors (k-NN), SVM and
naive Bayes. Plurality voting combining base-level classifiers outperformed other techniques. In a study [16],
the approach uses Legion:AR, a system for training an arbitrary activity recognition system in real time using
a crowd of workers. Kantoch and Kantoch [17] proposed a prototype of a body sensor network (BSN)-based
wearable wireless monitoring system optimized to monitor a patient's activity and physiological signals.
Further, in study [18], a local space-time feature is proposed to represent the human movement observed in a
video, and motion patterns are recognized with SVM classification schemes. A general kernel
method can be employed for recognition with local features. Several studies [19]-[21] focus on group
activities, where three different approaches are used to model person-person interaction. One approach explores
person-person interaction at the feature level, for which a new feature representation called the action context
(AC) descriptor is proposed. The third approach combines the first two. In study [22], an improvement to the
model is proposed by incorporating high-dimensional features of duration and time block characteristics.
3. RESEARCH METHOD

In this paper, the aim is to compare and analyze the performance of the system using different
algorithms with different numbers of features. The input to the model is a set of pre-processed signals
obtained using an accelerometer and gyroscope, passed through various noise filters and finally converted to
structured data; the output is the activity performed while the signals were recorded. Since the data has
high dimensionality, t-distributed stochastic neighbor embedding is used to map the high-dimensional
data to 2-D plots. Essentially, the main focus is on decreasing the number of features and doing
a comparative analysis. LDA is performed several times, measuring the accuracy each time, to analyze
how many features are actually required to predict the activity correctly.


3.1. Data 

The data consists of 10,299 observations, each with a 561-dimensional time- and frequency-domain feature vector and an
activity label. The data was collected from 30 people who wore waist-mounted smartphones and were asked to
perform various static and dynamic activities while the movement data was recorded. The activities
consisted of laying, sitting, standing, walking, walking downstairs and walking upstairs. The model is trained on
different numbers of bits k for B estimation. The movement data (the features
of the data) is based on 3-axial linear acceleration obtained using the accelerometer and 3-axial angular velocity
obtained using the gyroscope. These sensor signals were passed through noise filters. A windowing approach is
used for the segmentation of data. Further, signal frequency components were obtained by applying the fast
Fourier transform (FFT) [1]. The feature set (for example, mean, standard deviation, entropy, skewness and signal
magnitude area) therefore consists of summarized versions of those processed signals. For modelling,
the data was split as follows: i) training data: 7,352 and ii) test data: 2,947.
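
As a rough illustration of the windowing and summarization described above, the following Python sketch cuts a tri-axial signal into overlapping windows and computes a few time-domain statistics per window. The window size, step, function names and the exact definition of signal magnitude area used here are assumptions for illustration; they are not taken from the dataset's actual pipeline, which also includes noise filtering and FFT-based features.

```python
import statistics


def sliding_windows(signal, size, step):
    """Yield fixed-size windows; trailing samples that do not fill a window are dropped."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def window_features(ax, ay, az):
    """Summarise one window of tri-axial samples with a few time-domain statistics."""
    features = {}
    for name, axis in (("x", ax), ("y", ay), ("z", az)):
        features[f"mean_{name}"] = statistics.fmean(axis)
        features[f"std_{name}"] = statistics.pstdev(axis)
    # Signal magnitude area (assumed definition): window mean of |x| + |y| + |z|.
    features["sma"] = sum(abs(x) + abs(y) + abs(z) for x, y, z in zip(ax, ay, az)) / len(ax)
    return features


def extract(ax, ay, az, size=4, step=2):
    """Return one feature dictionary per window (step < size gives overlapping windows)."""
    windows = zip(sliding_windows(ax, size, step),
                  sliding_windows(ay, size, step),
                  sliding_windows(az, size, step))
    return [window_features(wx, wy, wz) for wx, wy, wz in windows]
```

Each returned dictionary corresponds to one row of the structured data; a real pipeline would use far longer windows and many more statistics per axis.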


3.2. Visualizing using t-SNE

One of the best and easiest ways to understand the complicated relationships within the data is
through data visualization. This is particularly important for communicating findings about high-dimensional
data. A variety of techniques have been proposed for visualization of such high-dimensional
data, like mosaic plots, parallel coordinate plots, projection pursuit and grand tour, and trellis displays, to
display different kinds of data like purely categorical, purely continuous and mixed-scale data respectively [23].
Here, a technique called t-SNE is used, which visualizes high-dimensional data by mapping each data point from the
high-dimensional space to a low-dimensional, i.e. two- or three-dimensional, map [24].

The objective of t-distributed stochastic neighbor embedding (t-SNE) is to visualize the data by
reducing the dimensionality while keeping similar instances close and dissimilar instances apart, thus
preserving local structure. In short, the t-SNE algorithm matches a similarity measure between pairs of
instances in the high-dimensional and low-dimensional spaces. Perplexity is, roughly, the number of nearest neighbors
considered when matching the original and fitted distributions for each data point. The perplexity is set high, such as 50
or 100, to take more of the big picture into account. From Figures 1 and 2 it is deduced that it is easy to
distinguish static from dynamic activities; however, the "standing" and "sitting" activities are
quite similar to each other, so it seems that there is relatively little difference between the position and
movement of a "sitting" person and a “standing” person. A solution to this problem is to use an algorithm dedicated to
separating only the “sitting” and “standing” observations.
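
To make the notion of perplexity concrete, the sketch below computes the Gaussian neighbor probabilities that t-SNE assigns around one point and the resulting perplexity, defined as 2 raised to the Shannon entropy of that distribution. The function names and the search bounds are illustrative assumptions; library implementations perform a comparable per-point search but differ in detail.

```python
import math


def conditional_probabilities(sq_distances, sigma):
    """Gaussian neighbour probabilities for one point, given squared distances to the others."""
    nearest = min(sq_distances)
    # Subtracting the smallest distance avoids underflow and leaves the ratios unchanged.
    weights = [math.exp(-(d - nearest) / (2.0 * sigma ** 2)) for d in sq_distances]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probabilities):
    """2 ** (Shannon entropy in bits); reads as an effective number of neighbours."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0.0)
    return 2.0 ** entropy


def sigma_for_perplexity(sq_distances, target, lo=1e-3, hi=1e3, iterations=60):
    """Bisection on a log scale for the kernel width whose perplexity matches the target."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probabilities(sq_distances, mid)) < target:
            lo = mid  # distribution too peaked: widen the kernel
        else:
            hi = mid  # distribution too flat: narrow the kernel
    return math.sqrt(lo * hi)
```

A larger target perplexity yields a wider kernel, so each point's neighborhood covers more of the data, which is why high values emphasize global rather than local structure.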
Figure 1. Visualization using t-SNE

Figure 2. A closer view of static and dynamic activities indicating a difference in static and dynamic
activities


Another way of dealing with high-dimensional data is to reduce the number of features by
converting the high-dimensional data set into low-dimensional data, which can then be displayed using a
scatterplot. Many techniques have been proposed for feature reduction, like principal component analysis
(PCA), multidimensional scaling (MDS) and locally linear embedding (LLE). In this paper, LDA is used.


4. LINEAR DISCRIMINANT ANALYSIS

LDA is a classification algorithm that learns the most discriminative axes between the classes during
training, and these axes can then be used to define a hyperplane onto which data is projected. This algorithm
is used to identify an m-dimensional summary of the data from the d-dimensional space such that the within-class
variance is minimized while the between-class variance is maximized. This supervised subspace learning
method projects the features $x_i$ in the high-dimensional space, where $X = [x_1, \ldots, x_n] \in \mathbb{R}^{d \times n}$,
using a projection matrix $W \in \mathbb{R}^{d \times m}$, assuming that $X$ has been centered with zero mean,
i.e., $\sum_{i=1}^{n} x_i = 0$:

$$W^{*} = \arg\max_{W} \operatorname{tr}\left( (W^{T} S_w W)^{-1} (W^{T} S_b W) \right) \quad (1)$$

where $S_w$ and $S_b$ are the within-class and between-class scatter matrices respectively:

$$S_w = \sum_{k=1}^{c} \sum_{i \in C_k} (x_i - \mu_k)(x_i - \mu_k)^{T} \quad (2)$$

$$S_b = \sum_{k=1}^{c} n_k (\mu_k - \mu)(\mu_k - \mu)^{T} \quad (3)$$

Here $C_k$ is the index set of the $k$-th class, $\mu_k$ and $n_k$ are the mean vector and size of the $k$-th class
respectively, $c$ is the number of classes and $\mu$ is the overall mean. If $S_t = S_w + S_b$, where $S_t$ is the
total scatter matrix, then, using (2) and (3), the objective can also be written as (4):

$$W^{*} = \arg\max_{W} \operatorname{tr}\left( (W^{T} S_t W)^{-1} (W^{T} S_b W) \right), \quad \text{where } S_t = \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{T} \quad (4)$$

When the total scatter matrix is singular, the solution to (4) consists of the top eigenvectors of the matrix $S_t^{+} S_b$
corresponding to nonzero eigenvalues, where $S_t^{+}$ is the pseudo-inverse of $S_t$. When the total scatter
matrix is non-singular, the solution to (4) consists of the top eigenvectors of the matrix $S_t^{-1} S_b$ corresponding to
nonzero eigenvalues [25].
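
The relation between the total, within-class and between-class scatter matrices follows from splitting each deviation from the overall mean into a within-class part and a between-class part. A short derivation is sketched below, with $\mu$ denoting the overall mean and $c$ the number of classes (notation assumed here for illustration):

```latex
\begin{aligned}
S_t &= \sum_{k=1}^{c} \sum_{i \in C_k} (x_i - \mu)(x_i - \mu)^{T} \\
    &= \sum_{k=1}^{c} \sum_{i \in C_k}
       \bigl[(x_i - \mu_k) + (\mu_k - \mu)\bigr]\bigl[(x_i - \mu_k) + (\mu_k - \mu)\bigr]^{T} \\
    &= \underbrace{\sum_{k=1}^{c} \sum_{i \in C_k} (x_i - \mu_k)(x_i - \mu_k)^{T}}_{S_w}
     + \underbrace{\sum_{k=1}^{c} n_k (\mu_k - \mu)(\mu_k - \mu)^{T}}_{S_b}
\end{aligned}
```

The cross terms vanish because $\sum_{i \in C_k} (x_i - \mu_k) = 0$ for every class $k$.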

The following steps are used for feature reduction and classification.

Step 1: First, the projection of the dataset is calculated using LDA and a number of features is selected;
this is called the LDA transform. The LDA transform is performed several times with different numbers
of features, the accuracy is calculated each time, and the number of features with the best
average performance is chosen.

Step 2: Different models are built on the LDA-transformed training data to find
the optimum dimension.

Step 3: Each model obtained after step 2 is used to predict classes in the validation data.

Step 4: The misclassification error is calculated for each model, and the model with the least error is used to
predict classes in the test data.

Step 5: The model is then further optimized by tuning hyper-parameters based on
its performance on the validation data.
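
Steps 3 and 4 amount to choosing, among the fitted candidates, the one with the lowest misclassification error on the validation data. A minimal sketch follows, assuming each candidate is identified by a hypothetical (algorithm name, number of LDA components) key and that its validation predictions have already been computed elsewhere.

```python
def misclassification_error(y_true, y_pred):
    """Fraction of validation samples whose predicted label is wrong."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def select_model(candidates, y_val):
    """Pick the candidate with the lowest validation error.

    candidates maps a hypothetical (name, n_components) key to that model's
    predictions on the validation set; fitting and the LDA projection are
    assumed to have happened beforehand.
    """
    errors = {key: misclassification_error(y_val, preds)
              for key, preds in candidates.items()}
    best = min(errors, key=errors.get)  # the first candidate wins ties
    return best, errors
```

Only the selected candidate is then evaluated on the test data, so the test set plays no part in the selection itself.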


5. ALGORITHMS

This section explains the preliminaries used in the proposed framework. For brevity, only basic
details of logistic regression, k-nearest neighbor, support vector machine and random forest are presented.
The specific details of these algorithms that are used during the empirical evaluation of the proposed
methodology are also given.


5.1. Logistic regression (LR) 

The logistic regression algorithm is used with a one-vs-rest (OvR) scheme. This is a parametric method
in which one binary classifier is trained for each class to predict whether a query belongs to that class or not. It follows the
assumption that all classes are independent of each other. The LDA transform is performed with different
numbers of dimensions. It has been observed that the performance is almost the same for five dimensions and
more.


5.2. k-nearest neighbor (k-NN) 

The k-NN classifier [26] is a nonparametric technique. It estimates the probability of a data point belonging
to a particular class B from the fraction of its k nearest neighbors that belong to the same class
B. The model is trained for different values of k in the range of the expected k-value, and the accuracy is
calculated each time. The best result is obtained at k=9. It is also observed that the model performs best
when the number of components is set to 6.
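
The search over k described above can be sketched as follows, assuming Euclidean distance, majority voting and a held-out validation set. The function names are illustrative, and this is not the implementation used for the reported results.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Label a query point by majority vote among its k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, val_x, val_y, k):
    """Fraction of validation points that receive their true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(val_x, val_y))
    return hits / len(val_x)


def best_k(train_x, train_y, val_x, val_y, candidates):
    """Return the candidate k with the highest validation accuracy (first wins ties)."""
    return max(candidates, key=lambda k: accuracy(train_x, train_y, val_x, val_y, k))
```

Very small values of k follow individual noisy neighbors, while larger values smooth the decision, which is why accuracy is measured for each candidate value.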


5.3. Support vector machine (SVM) 

The support vector machine [27] takes the data points and outputs a hyperplane that best separates the
classes. In the SVM model, data points are mapped so that the points belonging to different categories
are divided by a clear gap that is as wide as possible. SVM is one of the most effective algorithms for
high-dimensional data. A one-vs-all method with the number of coordinates equal to five is implemented.


5.4. Random forest (RF) 

Random forest [28] operates by selecting multiple bootstrap samples from the original dataset. In
addition to being an effective classifier, this approach can be used for dimensionality reduction by constructing
trees against a target attribute. The approach here is to construct a classification tree ensemble in which LDA is
employed for feature selection.


6. RESULTS AND DISCUSSION 

After implementing each algorithm on the original and LDA-transformed data, the performance of
the proposed model is evaluated using confusion matrices, learning curves and classification accuracy. Based
on the confusion matrices shown in Figures 3 to 6, it is observed that some activities are more difficult to predict
than others. For example, the ‘sitting’ class is usually misclassified as ‘standing’; it seems that the ‘sitting’
and 'standing' classes have nearly the same accelerometer and gyroscope patterns.

Figure 5. Learning curve and confusion matrix for SVM

Figure 6. Learning curve and confusion matrix for random forest


Analyzing the learning curves, the trend is almost the same for all the algorithms: the
model's performance improves with the amount of data that is fed into the
system. The F1 score in Table 1 is another way of evaluating the empirical performance of the algorithms. It can be
seen that it is easy to separate ‘Lay’ from the other activities. This is quite obvious from the visualization of the data
in Figure 1.

Based on the F1 scores in Table 1 and the accuracy scores in Table 2, it is deduced that SVM and RF
perform better after applying the LDA transformation. Surprisingly, the performance of LR and KNN improved much less
with a smaller number of features than that of the other algorithms in Table 2; hence it can be concluded that
feature reduction does not have much impact on score improvement when predicting with these two
algorithms.


Table 1. F1 scores by class for the basic (original) data and the LDA-transformed data
using different algorithms

Features   Algorithm   Sit    Stand   Lay    Walk   WU     WD
Original   LR          0.93   0.92    1.00   0.97   0.98   0.96
           KNN         0.88   0.85    1.00   0.91   0.86   0.89
           SVM         0.91   0.93    1.00   0.96   0.94   0.95
           RF          0.91   0.90    1.00   0.93   0.89   0.90
LDA        LR          0.91   0.93    1.00   0.98   0.97   0.99
           KNN         0.91   0.93    1.00   0.99   0.98   0.99
           SVM         0.92   0.92    1.00   0.99   0.98   0.99
           RF          0.91   0.92    1.00   0.94   0.92   0.96


Table 2. Comparative accuracy scores of the basic vs. LDA-enabled models for logistic regression, k-nearest neighbor,
support vector machine and random forest algorithms

Algorithm                 Accuracy (basic model)   Accuracy (LDA-enabled)
Logistic regression       0.950                    0.963
K-nearest neighbor        0.949                    0.96
Support vector machine    0.900                    0.967
Random forest             0.923                    0.964


The precision scores of the different algorithms after LDA transformation are shown in Figures 7 and 8.
They show the capability of each algorithm to classify each class correctly. It is observed
that random forest classifies the “sitting” and “standing” activities with the highest precision but lags in
predicting the “walk” and “walking downstairs” activities. On the other hand, SVM predicts the “walk” and “walk
downstairs” activities with the highest precision but lags in predicting the “sit” and “stand” activities. The KNN
algorithm can predict “sit” and “walk” with high precision. LR performs the worst compared to the other
algorithms. All the algorithms are able to predict the “lay” class with 1.00 precision.
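
The per-class precision and F1 scores and the overall accuracy discussed in this section can all be derived from the confusion matrix. The following is a small sketch under the usual definitions (precision = TP / predicted, recall = TP / actual, F1 = harmonic mean of the two); the function names are illustrative.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs; a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed from the confusion counts."""
    counts = confusion_counts(y_true, y_pred)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = counts[(label, label)]
        predicted = sum(n for (_, p), n in counts.items() if p == label)
        actual = sum(n for (t, _), n in counts.items() if t == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

When a ‘sit’ sample is predicted as ‘stand’, recall drops for ‘sit’ and precision drops for ‘stand’, so both F1 scores fall together; this matches the pattern of confusion between these two classes described above.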

7. CONCLUSION AND FUTURE WORK

In this paper, human activity recognition with an optimized number of features is performed using different
machine learning algorithms. The t-distributed stochastic neighbor embedding technique is used for visualization and
results are established. The role of feature reduction in decreasing the misclassification error rate by reducing
the variance due to a high-dimensional feature space is studied and analyzed. The proposed work also emphasizes data
visualization to understand the problem at hand. The proposed methodology obtained 6% less
misclassification error using SVM on LDA-transformed data when compared with the original data. This is
because the sparsity of the data is reduced by reducing the number of features, so the models are less likely to
overfit. The algorithms that worked best are the SVM classifier and random forest with LDA transformation. It
is observed that the class ‘Laying’ was correctly classified every time; this is mainly due to the difference in the
signals of this activity compared with the others. The proposed method shows that it is easier to distinguish
static activities (stand, sit, lay) from dynamic activities (Walk, WU, WD). However, the majority of the
misclassification error is due to certain activity labels which are closely related to each other, such as ‘stand’ and
‘sit’, and ‘walking’ and ‘walking downstairs’, resulting in a high error rate for these classifications. In future
work, implementing an algorithm to separate such classes can be investigated more deeply. Also, the issue
of variance is reduced but not completely resolved. The proposed work can be extended to reduce the
problem of variance by using different techniques such as the introduction of bias, collecting more data and
including regularization parameters.